
Collaborating Authors: group 0


Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

Sliwko, Leszek, Mizera-Pietraszko, Jolanta

arXiv.org Artificial Intelligence

This is the accepted version of the paper published in the 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Abstract -- This study presents a machine-learning-assisted approach to optimizing task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers such as Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operation, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies. In the rapidly evolving landscape of cloud computing and distributed high-performance environments, efficient management of architectural and software resources has become paramount for ensuring adequate performance and minimizing latency. As organizations increasingly rely on cluster-based architectures to orchestrate a broad range of applications, the importance of effective task scheduling has come to the forefront. Traditional schedulers such as Kubernetes have laid the groundwork for managing containerized workloads; however, they struggle to adapt to the dynamic nature of real-time workloads and node-affinity constraints [35]. These limitations result in inefficient resource utilization and longer scheduling delays, which ultimately degrade overall system performance, especially in high-performance systems [9][18].
In mission-critical environments, these issues can escalate, disrupting vital systems such as power networks, healthcare, and defense systems.
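The node-affinity scheduling problem this paper targets can be illustrated with a minimal sketch: filter candidate nodes by the task's required labels, then rank the survivors with a learned score. All names here are hypothetical, and the stand-in score is a placeholder for the paper's continuous transfer-learning model.

```python
# Minimal sketch of affinity-constrained node selection (hypothetical
# names; the paper's actual scorer is a continuously updated transfer
# learning model trained on Google Cluster Data).

def feasible_nodes(nodes, required_labels):
    """Keep only nodes whose labels satisfy every affinity constraint."""
    return [n for n in nodes
            if all(n["labels"].get(k) == v for k, v in required_labels.items())]

def schedule(task, nodes, score):
    """Pick the highest-scoring feasible node, or None if none qualify."""
    candidates = feasible_nodes(nodes, task["affinity"])
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(task, n))

nodes = [
    {"name": "n1", "labels": {"gpu": "yes", "zone": "a"}, "free_cpu": 8},
    {"name": "n2", "labels": {"gpu": "no", "zone": "a"}, "free_cpu": 16},
    {"name": "n3", "labels": {"gpu": "yes", "zone": "b"}, "free_cpu": 4},
]
task = {"affinity": {"gpu": "yes"}, "cpu": 2}

# Stand-in score: prefer the node with the most free CPU.
best = schedule(task, nodes, lambda t, n: n["free_cpu"])
print(best["name"])  # n1 (n2 has more free CPU but fails the GPU affinity)
```

A learned scorer slots into the `score` argument unchanged, which is what lets the model evolve during operation without touching the feasibility check.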


Transparency and Proportionality in Post-Processing Algorithmic Bias Correction

Ferreira, Juliett Suárez, Slavkovik, Marija, Casillas, Jorge

arXiv.org Artificial Intelligence

Algorithmic decision-making systems sometimes produce errors or skewed predictions toward a particular group, leading to unfair results. Debiasing practices, applied at different stages of the development of such systems, occasionally introduce new forms of unfairness or exacerbate existing inequalities. We focus on post-processing techniques that modify algorithmic predictions to achieve fairness in classification tasks, examining the unintended consequences of these interventions. To address this challenge, we develop a set of measures that quantify the disparity in the flips applied to the solution in the post-processing stage. The proposed measures will help practitioners: (1) assess the proportionality of the debiasing strategy used, (2) have transparency to explain the effects of the strategy in each group, and (3) based on those results, analyze the possibility of the use of some other approaches for bias mitigation or to solve the problem. We introduce a methodology for applying the proposed metrics during the post-processing stage and illustrate its practical application through an example. This example demonstrates how analyzing the proportionality of the debiasing strategy complements traditional fairness metrics, providing a deeper perspective to ensure fairer outcomes across all groups.
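The core quantity behind the proposed measures, the disparity in prediction "flips" introduced at the post-processing stage, can be sketched as follows. The metric and data here are illustrative; the paper defines its own family of measures.

```python
# Sketch of per-group flip rates after a post-processing debiasing step
# (illustrative metric; not the paper's exact definitions).
from collections import defaultdict

def flip_rates(y_before, y_after, groups):
    """Fraction of predictions flipped by post-processing, per group."""
    flips, totals = defaultdict(int), defaultdict(int)
    for yb, ya, g in zip(y_before, y_after, groups):
        totals[g] += 1
        flips[g] += int(yb != ya)
    return {g: flips[g] / totals[g] for g in totals}

y_before = [1, 0, 1, 1, 0, 0, 1, 0]   # original classifier outputs
y_after  = [1, 1, 0, 1, 0, 1, 1, 0]   # outputs after debiasing
groups   = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = flip_rates(y_before, y_after, groups)
print(rates)  # {'A': 0.5, 'B': 0.25}

# A large gap signals a disproportionate intervention on one group.
disparity = max(rates.values()) - min(rates.values())
print(disparity)  # 0.25
```

Reporting these rates alongside standard fairness metrics shows which group bears the burden of the correction, which is the transparency argument made above.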


Fairness-aware Contextual Dynamic Pricing with Strategic Buyers

Liu, Pangpang, Sun, Will Wei

arXiv.org Machine Learning

Contextual pricing strategies are prevalent in online retailing, where the seller adjusts prices based on products' attributes and buyers' characteristics. Although such strategies can enhance the seller's profits, they raise concerns about fairness when significant price disparities emerge among specific groups, such as gender or race. These disparities can lead to adverse perceptions of fairness among buyers and may even violate laws and regulations. At the same time, price differences can incentivize disadvantaged buyers to strategically manipulate their group identity to obtain a lower price. In this paper, we investigate contextual dynamic pricing with fairness constraints, taking into account buyers' strategic behaviors when their group status is private and unobservable by the seller. We propose a dynamic pricing policy that simultaneously achieves price fairness and discourages strategic behaviors. Our policy achieves an upper bound of $O(\sqrt{T}+H(T))$ regret over $T$ time horizons, where the term $H(T)$ arises from buyers' assessment of the fairness of the pricing policy based on their learned price difference. When buyers are able to learn the fairness of the price policy, this upper bound reduces to $O(\sqrt{T})$. We also prove an $\Omega(\sqrt{T})$ regret lower bound for any pricing policy under our problem setting. We support our findings with extensive experimental evidence, showcasing our policy's effectiveness. In our real data analysis, we observe the existence of price discrimination against race in loan applications even after accounting for other contextual information. Our proposed pricing policy demonstrates a significant improvement, achieving a 35.06% reduction in regret compared to the benchmark policy.
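The fairness constraint described above can be pictured with a toy mechanism that caps the gap between group-conditional prices. This is an illustrative sketch, not the authors' policy; all names and numbers are made up.

```python
# Illustrative sketch (not the paper's policy): shrink group-specific
# price adjustments so the largest group price gap is at most `tol`.
def fair_prices(base_price, group_adjust, tol):
    """Return per-group prices whose pairwise gap is capped at `tol`."""
    gap = max(group_adjust.values()) - min(group_adjust.values())
    scale = 1.0 if gap <= tol else tol / gap
    return {g: base_price + a * scale for g, a in group_adjust.items()}

# Uncapped, group g1 would pay 2.0 more than g0; the cap limits this.
prices = fair_prices(10.0, {"g0": 0.0, "g1": 2.0}, tol=0.5)
print(prices)  # {'g0': 10.0, 'g1': 10.5}
```

A capped gap also weakens the incentive to misreport group identity, since the achievable discount is bounded by the tolerance, which is the intuition behind discouraging strategic behavior.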


Monitoring fairness in machine learning models that predict patient mortality in the ICU

van Schaik, Tempest A., Liu, Xinggang, Atallah, Louis, Badawi, Omar

arXiv.org Artificial Intelligence

Benchmarking can include comparing an ICU's actual performance with predicted performance. The increased interoperability of medical devices, electronic health records (EHRs) and information systems has improved the acquisition and presentation of data to healthcare professionals. This data has enabled the training of predictive models. However, this plethora of data sources has also introduced new risks that societal bias will lead to machine learning systems with fairness issues for patient groups. In addition, when variations in data documentation are non-random, significant bias can be introduced, improving or worsening measured performance for an institution relative to peers. This work focuses on ICU mortality benchmarking. In particular, we analyze the fairness of a model based on Generalised Additive Models (GAM) [3] that predicts mortality in the ICU. This model is used to compare actual versus predicted outcomes to assess ICU performance.
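The actual-versus-predicted comparison described above is commonly expressed as a standardized mortality ratio (SMR): observed deaths divided by the model's expected deaths. Breaking the SMR out per patient group, as in this toy sketch with made-up numbers, is one way such a fairness monitor can surface group-level miscalibration.

```python
# Hedged sketch of per-group mortality benchmarking via SMR
# (toy numbers; not the paper's data or exact methodology).
def smr(observed_deaths, predicted_probs):
    """Observed deaths divided by the model's expected deaths."""
    expected = sum(predicted_probs)
    return observed_deaths / expected

# Per-patient predicted mortality risks and observed outcomes (1 = death).
preds = {"group_a": [0.25, 0.25, 0.25, 0.25],
         "group_b": [0.25, 0.25, 0.25, 0.25]}
obs   = {"group_a": [0, 0, 1, 0],
         "group_b": [0, 1, 1, 1]}

for g in preds:
    print(g, smr(sum(obs[g]), preds[g]))
# group_a 1.0  -> mortality matches the model's prediction
# group_b 3.0  -> model under-predicts mortality for this group
```

An SMR near 1 in one group and far from 1 in another indicates the benchmark would misjudge an ICU depending on its patient mix, which is exactly the fairness risk the work monitors.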


Difficult for Whom? A Study of Japanese Lexical Complexity

Nohejl, Adam, Hayakawa, Akio, Ide, Yusuke, Watanabe, Taro

arXiv.org Artificial Intelligence

The tasks of lexical complexity prediction (LCP) and complex word identification (CWI) commonly presuppose that difficult-to-understand words are shared across the target population. Meanwhile, personalization methods have also been proposed to adapt models to individual needs. We verify that a recent Japanese LCP dataset is representative of its target population by partially replicating the annotation. Through another reannotation, we show that native Chinese speakers perceive complexity differently due to Sino-Japanese vocabulary. To explore the possibilities of personalization, we compare competitive baselines trained on group mean ratings and on individual ratings in terms of performance for an individual. We show that the model trained on a group mean performs similarly to an individual model in the CWI task, while achieving good LCP performance for an individual is difficult. We also experiment with adapting a finetuned BERT model, which results in only marginal improvements across all settings.


SINBAD: Saliency-informed detection of breakage caused by ad blocking

Chehade, Saiid El Hajj, Siby, Sandra, Troncoso, Carmela

arXiv.org Artificial Intelligence

Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules.


Empowering Machines to Think Like Chemists: Unveiling Molecular Structure-Polarity Relationships with Hierarchical Symbolic Regression

Lou, Siyu, Liu, Chengchun, Chen, Yuntian, Mo, Fanyang

arXiv.org Artificial Intelligence

Thin-layer chromatography (TLC) is a crucial technique in molecular polarity analysis. Despite its importance, the interpretability of predictive models for TLC, especially those driven by artificial intelligence, remains a challenge. Current approaches, utilizing either high-dimensional molecular fingerprints or domain-knowledge-driven feature engineering, often face a dilemma between expressiveness and interpretability. To bridge this gap, we introduce Unsupervised Hierarchical Symbolic Regression (UHiSR), combining hierarchical neural networks and symbolic regression. UHiSR automatically distills chemical-intuitive polarity indices, and discovers interpretable equations that link molecular structure to chromatographic behavior.


Towards Automatic Satellite Images Captions Generation Using Large Language Models

He, Yingxu, Sun, Qiqi

arXiv.org Artificial Intelligence

Automatic image captioning is a promising technique for conveying visual information using natural language. It can benefit various tasks in satellite remote sensing, such as environmental monitoring, resource management, disaster management, etc. However, one of the main challenges in this domain is the lack of large-scale image-caption datasets, as they require a lot of human expertise and effort to create. Recent research on large language models (LLMs) has demonstrated their impressive performance in natural language understanding and generation tasks. Nonetheless, most of them cannot handle images (GPT-3.5, Falcon, Claude, etc.), while conventional captioning models pre-trained on general ground-view images often fail to produce detailed and accurate captions for aerial images (BLIP, GIT, CM3, CM3Leon, etc.). To address this problem, we propose a novel approach: Automatic Remote Sensing Image Captioning (ARSIC) to automatically collect captions for remote sensing images by guiding LLMs to describe their object annotations. We also present a benchmark model that adapts the pre-trained generative image2text model (GIT) to generate high-quality captions for remote-sensing images. Our evaluation demonstrates the effectiveness of our approach for collecting captions for remote sensing images.


Opportunities and Risks of LLMs for Scalable Deliberation with Polis

Small, Christopher T., Vendrov, Ivan, Durmus, Esin, Homaei, Hadjar, Barry, Elizabeth, Cornebise, Julien, Suzman, Ted, Ganguli, Deep, Megill, Colin

arXiv.org Artificial Intelligence

Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) to the challenges of facilitating, moderating and summarizing the results of Polis engagements. In particular, we demonstrate with pilot experiments using Anthropic's Claude that LLMs can indeed augment human intelligence to help more efficiently run Polis conversations. Specifically, we find that summarization capabilities enable categorically new methods with immense promise to empower the public in collective meaning-making exercises. Notably, LLM context limitations have a significant impact on the insight and quality of these results. However, these opportunities come with risks. We discuss some of these risks, as well as principles and techniques for characterizing and mitigating them, and the implications for other deliberative or political systems that may employ LLMs. Finally, we conclude with several open future research directions for augmenting tools like Polis with LLMs.


On the Fairness Impacts of Private Ensembles Models

Tran, Cuong, Fioretto, Ferdinando

arXiv.org Artificial Intelligence

The Private Aggregation of Teacher Ensembles (PATE) is a machine learning framework that enables the creation of private models through the combination of multiple "teacher" models and a "student" model. The student model learns to predict an output based on the voting of the teachers, and the resulting model satisfies differential privacy. PATE has been shown to be effective in creating private models in semi-supervised settings or when protecting data labels is a priority. This paper explores whether the use of PATE can result in unfairness, and demonstrates that it can lead to accuracy disparities among groups of individuals. The paper also analyzes the algorithmic and data properties that contribute to these disproportionate impacts, explains why these aspects affect different groups disproportionately, and offers recommendations for mitigating these effects.
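The teacher-voting mechanism at the heart of PATE can be sketched in a few lines: each teacher votes for a class, and the student receives the label with the highest noise-perturbed vote count. This is a simplified illustration; real PATE calibrates the noise (Laplace or Gaussian) to a formal differential-privacy analysis.

```python
# Minimal sketch of PATE-style noisy teacher voting (illustrative only;
# production PATE derives the noise scale from a privacy budget).
import random
from collections import Counter

def pate_label(teacher_votes, num_classes, noise_scale, rng):
    """Return the class with the highest noise-perturbed vote count."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + rng.gauss(0, noise_scale)
             for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])

rng = random.Random(0)
votes = [1, 1, 1, 1, 0, 1, 2, 1, 1, 1]   # 10 teachers' predictions
label = pate_label(votes, num_classes=3, noise_scale=0.5, rng=rng)
print(label)  # with a 7-vote majority, almost surely 1
```

The fairness concern analyzed in the paper arises here: for inputs where teachers disagree, the noise flips the outcome more often, and if disagreement correlates with group membership, accuracy degrades unevenly across groups.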